
Creators/Authors contains: "Mitchell, Cassie S."


  1. Diabetic kidney disease (DKD) is the leading cause of end-stage renal disease worldwide. This study’s goal was to identify the signaling drivers and pathways that modulate glomerular endothelial dysfunction in DKD via artificial intelligence-enabled literature-based discovery. Cross-domain text mining of 33+ million PubMed articles was performed with SemNet 2.0 to identify and rank multi-scalar and multi-factorial pathophysiological concepts related to DKD. A set of identified relevant genes and proteins that regulate different pathological events associated with DKD were analyzed and ranked using normalized mean HeteSim scores. High-ranking genes and proteins intersected three domains—DKD, the immune response, and glomerular endothelial cells. The top 10% of ranked concepts were mapped to the following biological functions: angiogenesis, apoptotic processes, cell adhesion, chemotaxis, growth factor signaling, vascular permeability, the nitric oxide response, oxidative stress, the cytokine response, macrophage signaling, NFκB factor activity, the TLR pathway, glucose metabolism, the inflammatory response, the ERK/MAPK signaling response, the JAK/STAT pathway, the T-cell-mediated response, the WNT/β-catenin pathway, the renin–angiotensin system, and NADPH oxidase activity. High-ranking genes and proteins were used to generate a protein–protein interaction network. The study results prioritized interactions or molecules involved in dysregulated signaling in DKD, which can be further assessed through biochemical network models or experiments.

     
    Free, publicly-accessible full text available April 1, 2025
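The abstract above ranks concepts by HeteSim score but does not spell out the computation. As a rough illustration only, the sketch below computes the normalized HeteSim relevance for a single two-hop path (source node, shared middle node, target node), where it reduces to the cosine similarity of the two nodes' transition-probability vectors over the middle nodes. The toy graph, the node names `A`, `B`, `C`, and the function names are hypothetical; SemNet 2.0's own implementation and its aggregation over many path types are not shown here.

```python
import math

def row_normalize(row):
    """Turn raw edge counts into transition probabilities over middle nodes."""
    total = sum(row)
    return [v / total for v in row] if total else [0.0] * len(row)

def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def hetesim_two_hop(source_edges, target_edges):
    """Normalized HeteSim for one two-hop path: both walkers meet at the middle nodes."""
    return cosine(row_normalize(source_edges), row_normalize(target_edges))

# Hypothetical toy graph: each vector lists edges to three shared middle nodes.
target = [1, 1, 0]
candidates = {"A": [1, 1, 0], "B": [0, 1, 1], "C": [0, 0, 1]}

scores = {name: hetesim_two_hop(edges, target) for name, edges in candidates.items()}
ranking = sorted(scores, key=scores.get, reverse=True)
print(ranking, scores)
```

Candidates that share more middle nodes with the target score closer to 1, which is the intuition behind ranking by relevance.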
  2. Background: Amyloid-β plaques (Aβ) are associated with Alzheimer’s disease (AD). Pooled assessment of amyloid reduction in transgenic AD mice is critical for expediting anti-amyloid AD therapeutic research. Objective: The mean threshold of Aβ reduction necessary to achieve cognitive improvement was measured via pooled assessment (n = 594 mice) of Morris water maze (MWM) escape latency of transgenic AD mice treated with substances intended to reduce Aβ via reduction of beta-secretase cleaving enzyme (BACE). Methods: Machine learning and statistical methods identified necessary amyloid reduction levels using mouse data (e.g., APP/PS1, LPS, Tg2576, 3xTg-AD, control, wild type, treated, untreated) curated from 22 published studies. Results: K-means clustering identified 4 clusters that primarily corresponded with level of Aβ: untreated transgenic AD control mice, wild type mice, and two clusters of transgenic AD mice treated with BACE inhibitors that had either an average 25% “medium reduction” of Aβ or 50% “high reduction” of Aβ compared to untreated control. A 25% Aβ reduction achieved a 28% cognitive improvement, and a 50% Aβ reduction resulted in a significant 32% improvement compared to untreated transgenic mice (p < 0.05). Comparatively, wild type mice had a mean 41% MWM latency improvement over untreated transgenic mice (p < 0.05). BACE reduction had a lesser impact on the ratio of Aβ42 to Aβ40. Supervised learning with an 80%–20% train-test split confirmed Aβ reduction was a key feature for predicting MWM escape latency (R² = 0.8 to 0.95). Conclusions: Results suggest a 25% reduction in Aβ as a meaningful treatment threshold for improving transgenic AD mouse cognition.

     
    Free, publicly-accessible full text available February 28, 2025
  3. The ability to translate Generative Adversarial Networks (GANs) and Variational Autoencoders (VAEs) into different modalities and data types is essential to improve Deep Learning (DL) for predictive medicine. This work presents DACMVA, a novel framework to conduct data augmentation in a cross-modal dataset by translating between modalities and oversampling imputations of missing data. DACMVA was inspired by previous work on the alignment of latent spaces in Autoencoders. DACMVA is a DL data augmentation pipeline that improves the performance in a downstream prediction task. The unique DACMVA framework leverages a cross-modal loss to improve the imputation quality and employs training strategies to enable regularized latent spaces. Oversampling of augmented data is integrated into the prediction training. It is empirically demonstrated that the new DACMVA framework is effective in the often-neglected scenario of DL training on tabular data with continuous labels. Specifically, DACMVA is applied towards cancer survival prediction on tabular gene expression data where there is a portion of missing data in a given modality. DACMVA significantly (p << 0.001, one-sided Wilcoxon signed-rank test) outperformed the non-augmented baseline and competing augmentation methods with varying percentages of missing data (4%, 90%, 95% missing). As such, DACMVA provides significant performance improvements, even in very-low-data regimes, over existing state-of-the-art methods, including TDImpute and oversampling alone.

     
    Free, publicly-accessible full text available January 1, 2025
  4. Background: Datasets on rare diseases, like pediatric acute myeloid leukemia (AML) and acute lymphoblastic leukemia (ALL), have small sample sizes that hinder machine learning (ML). The objective was to develop an interpretable ML framework to elucidate actionable insights from small tabular rare disease datasets. Methods: The comprehensive framework employed optimized data imputation and sampling, supervised and unsupervised learning, and literature-based discovery (LBD). The framework was deployed to assess treatment-related infection in pediatric AML and ALL. Results: An interpretable decision tree classified the risk of infection as either “high risk” or “low risk” in pediatric ALL (n = 580) and AML (n = 132) with accuracy of ∼79%. Interpretable regression models predicted the discrete number of developed infections with a mean absolute error (MAE) of 2.26 for bacterial infections and an MAE of 1.29 for viral infections. Features that best explained the development of infection were the chemotherapy regimen, cancer cells in the central nervous system at initial diagnosis, chemotherapy course, leukemia type, Down syndrome, race, and National Cancer Institute risk classification. Finally, SemNet 2.0, an open-source LBD software that links relationships from 33+ million PubMed articles, identified additional features for the prediction of infection, like glucose, iron, neutropenia-reducing growth factors, and systemic lupus erythematosus (SLE). Conclusions: The developed ML framework enabled state-of-the-art, interpretable predictions using rare disease tabular datasets. ML model performance baselines were successfully produced to predict infection in pediatric AML and ALL.

     
    Free, publicly-accessible full text available March 1, 2025
  5. This work presents SeizFt—a novel seizure detection framework that utilizes machine learning to automatically detect seizures using wearable SensorDot EEG data. Inspired by interpretable sleep staging, our novel approach employs a unique combination of data augmentation, meaningful feature extraction, and an ensemble of decision trees to improve resilience to variations in EEG and to increase the capacity to generalize to unseen data. Fourier Transform (FT) Surrogates were utilized to increase sample size and improve the class balance between labeled non-seizure and seizure epochs. To enhance model stability and accuracy, SeizFt utilizes an ensemble of decision trees through the CatBoost classifier to classify each second of EEG recording as seizure or non-seizure. The SeizIt1 dataset was used for training, and the SeizIt2 dataset for validation and testing. Model performance for seizure detection was evaluated using two primary metrics: sensitivity using the any-overlap method (OVLP) and False Alarm (FA) rate using epoch-based scoring (EPOCH). Notably, SeizFt placed first among an array of state-of-the-art seizure detection algorithms as part of the Seizure Detection Grand Challenge at the 2023 International Conference on Acoustics, Speech, and Signal Processing (ICASSP). SeizFt outperformed state-of-the-art black-box models in accurate seizure detection and minimized false alarms, obtaining a total score of 40.15, combining OVLP and EPOCH across two tasks and representing an improvement of ~30% from the next best approach. The interpretability of SeizFt is a key advantage, as it fosters trust and accountability among healthcare professionals. The most predictive seizure detection features extracted from SeizFt were: delta wave, interquartile range, standard deviation, total absolute power, theta wave, the ratio of delta to theta, binned entropy, Hjorth complexity, delta + theta, and Higuchi fractal dimension. 
     In conclusion, the successful application of SeizFt to wearable SensorDot data suggests its potential for real-time, continuous monitoring to improve personalized medicine for epilepsy.

     
    Free, publicly-accessible full text available August 1, 2024
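The SeizFt abstract mentions Fourier Transform (FT) surrogates for augmentation without giving implementation details. A minimal sketch of the standard phase-randomization form is shown below, assuming NumPy is available: the amplitude spectrum of an epoch is kept and the phases are shifted by random amounts, yielding a new signal with the same power spectrum. The function name `ft_surrogate`, the synthetic test epoch, and the seed are illustrative assumptions, not the settings used in SeizFt.

```python
import numpy as np

def ft_surrogate(signal, rng):
    """Return a phase-randomized surrogate with the same amplitude spectrum."""
    x = np.asarray(signal, dtype=float)
    n = x.size
    spectrum = np.fft.rfft(x)
    # Draw one random phase shift per frequency bin.
    phases = rng.uniform(0.0, 2.0 * np.pi, size=spectrum.size)
    phases[0] = 0.0          # leave the DC bin (signal mean) untouched
    if n % 2 == 0:
        phases[-1] = 0.0     # the Nyquist bin must stay real for even lengths
    shifted = spectrum * np.exp(1j * phases)
    return np.fft.irfft(shifted, n=n)

# Synthetic stand-in for one EEG epoch (hypothetical data, not SeizIt1).
rng = np.random.default_rng(0)
t = np.linspace(0.0, 1.0, 256, endpoint=False)
epoch = np.sin(2 * np.pi * 4 * t) + 0.1 * rng.standard_normal(t.size)
surrogate = ft_surrogate(epoch, rng)

# The surrogate differs in the time domain but shares the amplitude spectrum.
same_spectrum = np.allclose(np.abs(np.fft.rfft(surrogate)), np.abs(np.fft.rfft(epoch)))
print(same_spectrum)
```

Generating several surrogates per seizure epoch is one way such a method can rebalance a dataset where seizure epochs are rare.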
  6. Chronic myeloid leukemia (CML) is treated with tyrosine kinase inhibitors (TKI) that target the pathological BCR-ABL1 fusion oncogene. The objective of this statistical meta-analysis was to assess the prevalence of other hematological adverse events (AEs) that occur during or after predominantly first-line treatment with TKIs. Data from seventy peer-reviewed, published studies were included in the analysis. Hematological AEs were assessed as a function of TKI drug type (dasatinib, imatinib, bosutinib, nilotinib) and CML phase (chronic, accelerated, blast). AE prevalence aggregated across all severities and phases was significantly different between each TKI (p < 0.05) for anemia—dasatinib (54.5%), bosutinib (44.0%), imatinib (32.8%), nilotinib (11.2%); neutropenia—dasatinib (51.2%), imatinib (29.8%), bosutinib (14.1%), nilotinib (14.1%); thrombocytopenia—dasatinib (62.2%), imatinib (30.4%), bosutinib (35.3%), nilotinib (22.3%). AE prevalence aggregated across all severities and TKIs was significantly (p < 0.05) different between CML phases for anemia—chronic (28.4%), accelerated (66.9%), blast (55.8%); neutropenia—chronic (26.7%), accelerated (63.8%), blast (36.4%); thrombocytopenia—chronic (33.3%), accelerated (65.6%), blast (37.9%). An odds ratio (OR) with 95% confidence interval was used to compare hematological AE prevalence of each TKI compared to the most common first-line TKI therapy, imatinib. For anemia, dasatinib OR = 1.65, [1.51, 1.83]; bosutinib OR = 1.34, [1.16, 1.54]; nilotinib OR = 0.34, [0.30, 0.39]. For neutropenia, dasatinib OR = 1.72, [1.53, 1.92]; bosutinib OR = 0.47, [0.38, 0.58]; nilotinib OR = 0.47, [0.42, 0.54]. For thrombocytopenia, dasatinib OR = 2.04, [1.82, 2.30]; bosutinib OR = 1.16, [0.97, 1.39]; nilotinib OR = 0.73, [0.65, 0.82]. Nilotinib had the greatest fraction of severe (grade 3/4) hematological AEs (30%). 
     In conclusion, the overall prevalence of hematological AEs by TKI type was: dasatinib > bosutinib > imatinib > nilotinib. Study limitations include inability to normalize for dosage and treatment duration.

     
    Free, publicly-accessible full text available September 1, 2024
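The CML abstract reports odds ratios with 95% confidence intervals relative to imatinib but does not state how the intervals were computed. The sketch below shows one common choice, the Wald interval on the log odds ratio, which is an assumption here rather than the study's documented method. The patient counts are hypothetical and are not taken from the meta-analysis.

```python
import math

def odds_ratio_ci(events_a, total_a, events_ref, total_ref, z=1.96):
    """Odds ratio of group A versus a reference group, with a Wald interval.

    z=1.96 corresponds to an approximate 95% confidence level.
    """
    a = events_a                 # group A, with adverse event
    b = total_a - events_a       # group A, without adverse event
    c = events_ref               # reference group, with adverse event
    d = total_ref - events_ref   # reference group, without adverse event
    if min(a, b, c, d) <= 0:
        raise ValueError("all four cells must be positive for the Wald interval")
    odds_ratio = (a * d) / (b * c)
    std_err = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)  # SE of log(OR)
    lower = math.exp(math.log(odds_ratio) - z * std_err)
    upper = math.exp(math.log(odds_ratio) + z * std_err)
    return odds_ratio, lower, upper

# Hypothetical counts: 60 of 100 patients on drug A versus 40 of 100 on the reference drug.
or_value, ci_low, ci_high = odds_ratio_ci(60, 100, 40, 100)
print(f"OR = {or_value:.2f}, [{ci_low:.2f}, {ci_high:.2f}]")
```

An interval that excludes 1, as in this toy case, indicates a difference from the reference group at the chosen confidence level; an interval that spans 1, such as the one reported for bosutinib thrombocytopenia, does not.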
  7. Free, publicly-accessible full text available June 4, 2024
  8. Multiple studies have reported new or exacerbated persistent or resistant hypertension in patients previously infected with COVID-19. We used literature-based discovery to identify and prioritize multi-scalar explanatory biology that relates resistant hypertension to COVID-19. Cross-domain text mining of 33+ million PubMed articles within a comprehensive knowledge graph was performed using SemNet 2.0. Unsupervised rank aggregation determined which concepts were most relevant utilizing the normalized HeteSim score. A series of simulations identified concepts directly related to COVID-19 and resistant hypertension or connected via one of three renin–angiotensin–aldosterone system hub nodes (mineralocorticoid receptor, epithelial sodium channel, angiotensin I receptor). The top-ranking concepts relating COVID-19 to resistant hypertension included: cGMP-dependent protein kinase II, MAP3K1, haspin, ral guanine nucleotide exchange factor, N-(3-Oxododecanoyl)-L-homoserine lactone, aspartic endopeptidases, metabotropic glutamate receptors, choline-phosphate cytidylyltransferase, protein tyrosine phosphatase, tat genes, MAP3K10, uridine kinase, dicer enzyme, CMD1B, USP17L2, FLNA, exportin 5, somatotropin releasing hormone, beta-melanocyte stimulating hormone, pegylated leptin, beta-lipoprotein, corticotropin, growth hormone-releasing peptide 2, pro-opiomelanocortin, alpha-melanocyte stimulating hormone, prolactin, thyroid hormone, poly-beta-hydroxybutyrate depolymerase, CR 1392, BCR-ABL fusion gene, high density lipoprotein sphingomyelin, pregnancy-associated murine protein 1, recQ4 helicase, immunoglobulin heavy chain variable domain, aglycotransferrin, host cell factor C1, ATP6V0D1, imipramine demethylase, TRIM40, H3C2 gene, COL1A1+COL1A2 gene, QARS gene, VPS54, TPM2, MPST, EXOSC2, ribosomal protein S10, TAP-144, gonadotropins, human gonadotropin releasing hormone 1, beta-lipotropin, octreotide, salmon calcitonin, des-n-octanoyl ghrelin, liraglutide, gastrins. 
     Concepts were mapped to six physiological themes: altered endocrine function, 23.1%; inflammation or cytokine storm, 21.3%; lipid metabolism and atherosclerosis, 17.6%; sympathetic input to blood pressure regulation, 16.7%; altered entry of COVID-19 virus, 14.8%; and unknown, 6.5%.

     
    Free, publicly-accessible full text available September 1, 2024
  9. Parkinson’s disease (PD) is a movement disorder caused by a dopamine deficit in the brain. Current therapies primarily focus on dopamine modulators or replacements, such as levodopa. Although dopamine replacement can help alleviate PD symptoms, therapies targeting the underlying neurodegenerative process are limited. The study objective was to use artificial intelligence to rank the most promising repurposed drug candidates for PD. Natural language processing (NLP) techniques were used to extract text relationships from 33+ million biomedical journal articles from PubMed and map relationships between genes, proteins, drugs, diseases, etc., into a knowledge graph. Cross-domain text mining, hub network analysis, and unsupervised learning rank aggregation were performed in SemNet 2.0 to predict the most relevant drug candidates to levodopa and PD using relevance-based HeteSim scores. The top predicted adjuvant PD therapies included ebastine, an antihistamine for perennial allergic rhinitis; levocetirizine, another antihistamine; vancomycin, a powerful antibiotic; captopril, an angiotensin-converting enzyme (ACE) inhibitor; and neramexane, an N-methyl-D-aspartate (NMDA) receptor antagonist. Cross-domain text mining predicted that antihistamines exhibit the capacity to synergistically alleviate Parkinsonian symptoms when used with dopamine modulators like levodopa or levodopa–carbidopa. The relationship patterns among the identified adjuvant candidates suggest that the likely therapeutic mechanism(s) of action of antihistamines for combatting the multi-factorial PD pathology include counteracting oxidative stress, amending the balance of neurotransmitters, and decreasing the proliferation of inflammatory mediators. Finally, cross-domain text mining interestingly predicted a strong relationship between PD and liver disease.

     
    Free, publicly-accessible full text available August 1, 2024
  10. Background: The complex and not yet fully understood etiology of Alzheimer’s disease (AD) shows important proteopathic signs which are unlikely to be linked to a single protein. However, protein subsets from deep proteomic datasets can be useful in stratifying patient risk, identifying stage dependent disease markers, and suggesting possible disease mechanisms. Objective: The objective was to identify protein subsets that best classify subjects into control, asymptomatic Alzheimer’s disease (AsymAD), and AD. Methods: Data comprised 6 cohorts; 620 subjects; 3,334 proteins. Brain tissue-derived predictive protein subsets for classifying AD, AsymAD, or control were identified and validated with label-free quantification and machine learning. Results: A 29-protein subset accurately classified AD (AUC = 0.94). However, an 88-protein subset best predicted AsymAD (AUC = 0.92) or Control (AUC = 0.92) from AD (AUC = 0.98). AD versus Control: APP, DHX15, NRXN1, PBXIP1, RABEP1, STOM, and VGF. AD versus AsymAD: ALDH1A1, BDH2, C4A, FABP7, GABBR2, GNAI3, PBXIP1, and PRKAR1B. AsymAD versus Control: APP, C4A, DMXL1, EXOC2, PITPNB, RABEP1, and VGF. Additional predictors: DNAJA3, PTBP2, SLC30A9, VAT1L, CROCC, PNP, SNCB, ENPP6, HAPLN2, PSMD4, and CMAS. Conclusion: Biomarkers were dynamically separable across disease stages. Predictive proteins were significantly enriched for sugar metabolism.
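The classification results above are reported as AUC values. For readers less familiar with the metric, the sketch below computes AUC for one class against the rest using its rank interpretation: the probability that a randomly chosen positive subject receives a higher score than a randomly chosen negative one, with ties counted as one half. The labels and scores are hypothetical and unrelated to the study's proteomic data; the function name is an assumption for illustration.

```python
def auc_from_scores(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in positives:
        for q in negatives:
            if p > q:
                wins += 1.0      # positive ranked above negative
            elif p == q:
                wins += 0.5      # ties count as half a win
    return wins / (len(positives) * len(negatives))

# Hypothetical classifier outputs: 1 = class of interest, 0 = all other classes.
labels = [1, 1, 0, 0, 1, 0]
scores = [0.9, 0.8, 0.7, 0.3, 0.6, 0.2]
auc = auc_from_scores(labels, scores)
print(round(auc, 3))
```

An AUC of 0.5 corresponds to chance-level ranking and 1.0 to perfect separation, which gives context for the 0.92 to 0.98 values reported in the abstract.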